SQL Server 2008 R2 : Database Files and Filegroups (part 2)

7/7/2013 9:35:30 PM

Using Filegroups

All databases have a primary filegroup that contains the primary data file. There can be only one primary filegroup. If you don’t create any other filegroups or change the default filegroup to one other than the primary filegroup, all files and objects are placed in the primary filegroup unless you specifically place them in another filegroup.

In addition to the primary filegroup, you can add one or more filegroups to the database, and a filegroup can contain one or more files. The main purpose of using filegroups is to provide more control over the placement of files and data on your server. When you create a table or index, you can map it to a specific filegroup, thus controlling the placement of data. A typical SQL Server database installation generally uses a single RAID array to spread I/O across disks and creates all files in the primary filegroup. More advanced installations, or installations with very large databases spread across multiple array sets, can benefit from the finer control over file and data placement that additional filegroups provide.

For example, for a simple database such as AdventureWorks, you can create just one primary file that contains all data and objects and a log file that contains the transaction log information. For a larger and more complex database, such as a securities trading system where large data volumes and strict performance criteria are the norm, you might create the database with one primary file and four additional secondary files. You can then set up filegroups so you can place the data and objects within the database across all five files. If you have a table that itself needs to be spread across multiple disk arrays for performance reasons, you can place multiple files in a filegroup, each of which resides on a different disk, and create the table on that filegroup. For example, you can create three files (Data1.ndf, Data2.ndf, and Data3.ndf) on three disk arrays, respectively, and then assign them to the filegroup called spread_group. Your table can then be created specifically on the filegroup spread_group. Queries for data from the table are spread across the three disk arrays, thereby improving I/O performance.
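As a rough sketch of the spread_group layout just described, the following statements add a filegroup with three files to an existing database and create a table on it. The database name (Trading), the logical file names, the drive letters, the sizes, and the table definition are all assumptions made for illustration only.

```sql
-- Hypothetical database name; assumes the database already exists
ALTER DATABASE Trading
 ADD FILEGROUP spread_group
GO

-- Each file is placed on a different disk array (drive letters are assumed)
ALTER DATABASE Trading
 ADD FILE
   ( NAME='Data1', FILENAME='E:\SQLData\Data1.ndf', SIZE=100MB, FILEGROWTH=20MB),
   ( NAME='Data2', FILENAME='F:\SQLData\Data2.ndf', SIZE=100MB, FILEGROWTH=20MB),
   ( NAME='Data3', FILENAME='G:\SQLData\Data3.ndf', SIZE=100MB, FILEGROWTH=20MB)
 TO FILEGROUP spread_group
GO

USE Trading
GO
-- Illustrative table; its data is allocated across all three files
CREATE TABLE trade_history
(trade_id INT, trade_date DATETIME, details NVARCHAR(1000))
 ON spread_group
GO
```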

If a filegroup contains more than one file, when space is allocated to objects stored in that filegroup, the space is taken from the files in proportion to the free space in each. In other words, if one file in a filegroup has twice as much free space as another, the first file has two extents allocated from it for each extent allocated from the second file.
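To see how much free space each file currently has, and so roughly how allocations within a filegroup are likely to be weighted, a query along the following lines can be run in the database of interest. This is only a sketch built on sys.database_files and the FILEPROPERTY function; the column aliases are arbitrary.

```sql
-- size and FILEPROPERTY(..., 'SpaceUsed') are both reported in 8-KB pages,
-- so dividing by 128 converts them to megabytes
SELECT
     sfg.name as filegroupname,
     sf.name as filename,
     sf.size/128 as size_in_MB,
     (sf.size - FILEPROPERTY(sf.name, 'SpaceUsed'))/128 as free_in_MB
 FROM sys.database_files sf
 INNER JOIN sys.filegroups sfg
 ON sf.data_space_id = sfg.data_space_id
 ORDER BY sfg.name, sf.name
GO
```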

Listing 2 provides an example of using filegroups in a database to control the file placement of the customer_info table.

Listing 2. Using a Filegroup to Control Placement for a Table

CREATE DATABASE Customer
ON ( NAME='Customer_Data',
    FILENAME='C:\SQLData\Customer_Data1.mdf',
    SIZE=50,
    MAXSIZE=100,
    FILEGROWTH=10)
LOG ON ( NAME='Customer_Log',
    FILENAME='C:\SQLData\Customer_Log.ldf',
    SIZE=50,
    FILEGROWTH=20%)
GO

ALTER DATABASE Customer
 ADD FILEGROUP Cust_table
GO

ALTER DATABASE Customer
 ADD FILE
   ( NAME='Customer_Data2',
    FILENAME='G:\SQLData\Customer_Data2.ndf',
    SIZE=100,
    FILEGROWTH=20)
 TO FILEGROUP Cust_Table
GO

USE Customer
CREATE TABLE customer_info
(cust_no INT, cust_address NCHAR(200), info NVARCHAR(3000))
 ON Cust_Table
GO

The CREATE DATABASE statement in Listing 2 creates a database with a primary database file and a log file. The first ALTER DATABASE statement adds a filegroup. The second ALTER DATABASE statement adds a secondary database file and assigns it to the Cust_Table filegroup. The CREATE TABLE statement creates a table; the ON Cust_Table clause places the table in the Cust_Table filegroup (the Customer_Data2 file on the G: disk partition).
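If you would rather have new objects go to the Cust_Table filegroup without naming it each time, you can make it the default filegroup. The following is a minimal sketch that assumes the Customer database from Listing 2 already exists; the customer_notes table is hypothetical and exists only to show the effect.

```sql
-- Make Cust_Table the default filegroup for the Customer database
ALTER DATABASE Customer
 MODIFY FILEGROUP Cust_Table DEFAULT
GO

USE Customer
GO
-- Hypothetical table; with no ON clause it is created on the default filegroup
CREATE TABLE customer_notes
(cust_no INT, note NVARCHAR(500))
GO
```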

The sys.filegroups system catalog view contains information about the database filegroups defined within a database, as shown in Table 2.

Table 2. The sys.filegroups System Catalog View
Column Name	Description
name	Name of the data space, unique within the database.
data_space_id	Data space ID number, unique within the database.
type	FG = Filegroup.
type_desc	Description of data space type: ROWS_FILEGROUP.
is_default	1 = This is the default data space. The default data space is used when a filegroup or partition scheme is not specified in a CREATE TABLE or CREATE INDEX statement. 0 = This is not the default data space.
filegroup_guid	GUID for the filegroup. NULL = PRIMARY filegroup.
log_filegroup_id	Not used; value is NULL.
is_read_only	1 = Filegroup is read-only. 0 = Filegroup is read/write.
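As a quick illustration, the columns described in Table 2 can be inspected directly. This sketch assumes the Customer database created in Listing 2 and selects only a subset of the columns.

```sql
USE Customer
GO
-- One row is returned for each filegroup defined in the current database
SELECT name, data_space_id, type, type_desc, is_default, is_read_only
 FROM sys.filegroups
GO
```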

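The following statement returns the logical filename, the current size in megabytes (not including any future autogrowth), and the name of the filegroup to which each file belongs. The size column in sys.database_files is expressed in 8-KB pages, so dividing it by 128 yields megabytes: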

SELECT
     convert(varchar(30), sf.name) as filename,
     size/128 as size_in_MB,
     convert(varchar(30), sfg.name) as filegroupname
 FROM sys.database_files sf
 INNER JOIN sys.filegroups sfg
 ON sf.data_space_id = sfg.data_space_id
go

filename                       size_in_MB  filegroupname
------------------------------ ----------- -------------------------
Customer_Data                  50          PRIMARY
Customer_Data2                 100         Cust_table

FILESTREAM Filegroups

FILESTREAM storage is a new feature in SQL Server 2008 for storing unstructured data, such as documents, images, and videos. It integrates the SQL Server Database Engine with the NTFS file system so that the unstructured data is stored on the file system while the database stores a pointer to the data. Although the actual data resides outside the database in the NTFS file system, you can still use Transact-SQL (T-SQL) statements to insert, update, query, and back up FILESTREAM data, while maintaining transactional consistency between the unstructured data and the corresponding structured data with the same level of security.

Note

To use FILESTREAM storage, you must first enable FILESTREAM storage at the Windows level as well as at the SQL Server instance level. You can enable FILESTREAM at the Windows level during installation of SQL Server 2008 or at any time using SQL Server Configuration Manager. After you enable FILESTREAM at the Windows level, you next need to enable FILESTREAM for the SQL Server instance. You can do this either through SQL Server Management Studio (SSMS) or via T-SQL.
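For the T-SQL route, a minimal sketch looks like the following. It assumes FILESTREAM has already been enabled at the Windows level, and the access level of 2 is only one possible choice.

```sql
-- 0 = FILESTREAM disabled for the instance
-- 1 = enabled for Transact-SQL access
-- 2 = enabled for Transact-SQL and Win32 streaming access
EXEC sp_configure 'filestream access level', 2
RECONFIGURE
GO
```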

After you enable FILESTREAM for the SQL Server instance, you can enable it for a database by creating a FILESTREAM filegroup. You can do this when you create the database, or later for an existing database, by adding a filegroup that includes the CONTAINS FILESTREAM clause. Unlike regular filegroups, a FILESTREAM filegroup can contain only a single file reference, which is actually a file system folder rather than a file. The folder itself must not already exist (although the path up to the folder must exist); SQL Server creates the FILESTREAM folder. For example, in Listing 3, the code adds a FILESTREAM filegroup called Cust_FSGroup and adds the folder G:\SQLData\custinfo_FS to the filegroup. This custinfo_FS folder is created by SQL Server in the G:\SQLData folder.

Listing 3. Adding a FILESTREAM Filegroup to a Database

ALTER DATABASE Customer
 ADD FILEGROUP Cust_FSGroup CONTAINS FILESTREAM
GO

ALTER DATABASE Customer
 ADD FILE
   ( NAME=custinfo_FS,
     FILENAME = 'G:\SQLData\custinfo_FS')
 TO FILEGROUP Cust_FSGroup
GO

If you look in the G:\SQLData\custinfo_FS folder, you should see a Filestream.hdr file and a $FSLOG folder. The Filestream.hdr file is a FILESTREAM container header file that should not be moved or modified.

As you can see in the example in Listing 3, for FILESTREAM files and filegroups, unlike regular files, you do not specify size or growth information. No space is preallocated. The file and filegroup grow as data is added to tables that have been created with FILESTREAM columns.

As you create tables with FILESTREAM columns, a subfolder is created in the filegroup folder for each table. The subfolder names are GUIDs. Each FILESTREAM column created in the table results in another subfolder created under the table subfolder. The column subfolder name is also a GUID. At this point, no actual files have been created yet. That happens after you start adding rows to the table. A file is created in the column subfolder for each row inserted into the table with a non-NULL value for the FILESTREAM column.
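To make this concrete, the following sketch creates a table with one FILESTREAM column on the Cust_FSGroup filegroup from Listing 3 and inserts two rows. The table name, column names, and sample values are hypothetical. A table that contains a FILESTREAM column also needs a UNIQUEIDENTIFIER column with the ROWGUIDCOL property and a unique constraint, which is why doc_id is included.

```sql
USE Customer
GO
-- Hypothetical table; doc is stored in the FILESTREAM filegroup
CREATE TABLE customer_docs
(doc_id UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
 cust_no INT,
 doc VARBINARY(MAX) FILESTREAM NULL)
 FILESTREAM_ON Cust_FSGroup
GO

-- Non-NULL FILESTREAM value: a file is created in the column subfolder
INSERT INTO customer_docs (cust_no, doc)
 VALUES (1, CAST('sample document text' AS VARBINARY(MAX)))

-- NULL FILESTREAM value: no file is created for this row
INSERT INTO customer_docs (cust_no, doc)
 VALUES (2, NULL)
GO
```

After the first INSERT, the column subfolder under G:\SQLData\custinfo_FS should contain one file; the second row adds none.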